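Underwater imaging is a critical task performed by marine robots for a wide range of applications, including aquaculture, marine infrastructure inspection, and environmental monitoring. However, water column effects such as attenuation and backscattering drastically change the color and quality of imagery captured underwater. Because water conditions vary and these effects are range-dependent, restoring underwater imagery is a challenging problem. This impacts downstream perception tasks, including depth estimation and 3D reconstruction. In this paper, we advance the state of the art in neural radiance fields (NeRFs) to enable physics-informed dense depth estimation and color correction. Our proposed method, WaterNeRF, estimates the parameters of a physics-based model of underwater image formation, leading to a hybrid data-driven and model-based solution. After determining the scene structure and radiance field, we can produce novel views of degraded and corrected underwater images, along with dense depth of the scene. We evaluate the proposed method qualitatively and quantitatively on a real underwater dataset.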
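The abstract does not spell out the image formation model, so the following is only a minimal sketch under an assumed, commonly used form: a direct signal attenuated exponentially with range, plus backscatter that saturates toward a veiling-light value. The function names and the parameters `beta_d`, `beta_b`, and `b_inf` are hypothetical and are not taken from the paper.

```python
import math


def degrade(j, z, beta_d, beta_b, b_inf):
    """Simulate one color channel of an underwater pixel.

    Assumed model (not confirmed by the abstract):
        I = J * exp(-beta_d * z) + b_inf * (1 - exp(-beta_b * z))
    j      : true (in-air) intensity of the channel
    z      : range from camera to scene point
    beta_d : attenuation coefficient of the direct signal
    beta_b : backscatter coefficient
    b_inf  : backscatter intensity at infinite range
    """
    direct = j * math.exp(-beta_d * z)
    backscatter = b_inf * (1.0 - math.exp(-beta_b * z))
    return direct + backscatter


def correct(i, z, beta_d, beta_b, b_inf):
    """Invert the assumed model: remove backscatter, then undo attenuation."""
    backscatter = b_inf * (1.0 - math.exp(-beta_b * z))
    return (i - backscatter) * math.exp(beta_d * z)


# Round trip with illustrative values for a single channel.
observed = degrade(0.6, z=3.0, beta_d=0.4, beta_b=0.3, b_inf=0.2)
restored = correct(observed, z=3.0, beta_d=0.4, beta_b=0.3, b_inf=0.2)
```

The sketch shows why correction is range-dependent: both terms depend on `z`, so a dense depth estimate is needed before the color can be restored.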
Training self-driving cars is often challenging since they require a vast amount of labeled data in multiple real-world contexts, which is computationally and memory intensive. Researchers often resort to driving simulators to train the agent and transfer the knowledge to a real-world setting. Since simulators lack realistic behavior, these methods are quite inefficient. To address this issue, we introduce a framework (perception, planning, and control) in a real-world driving environment that transfers the real-world environments into gaming environments by setting up a reliable Markov Decision Process (MDP). We propose variations of existing Reinforcement Learning (RL) algorithms in a multi-agent setting to learn and execute the discrete control in real-world environments. Experiments show that the multi-agent setting outperforms the single-agent setting in all the scenarios. We also propose reliable initialization, data augmentation, and training techniques that enable the agents to learn and generalize to navigate in a real-world environment with minimal input video data, and with minimal training. Additionally, to show the efficacy of our proposed algorithm, we deploy our method in the virtual driving environment TORCS.
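High Dynamic Range (HDR) videos can represent a greater range of luminance and color than Standard Dynamic Range (SDR) videos and are rapidly becoming an industry standard. HDR videos have more challenging capture, transmission, and display requirements than legacy SDR videos. Given their greater bit depth, advanced electro-optical transfer functions, and wider color gamuts, there is a need for video quality algorithms designed specifically to predict the quality of HDR videos. To this end, we present the first publicly released large-scale subjective study of HDR videos. We study the effect of distortions such as compression and aliasing on HDR video quality. We also study the effect of ambient illumination on the perceptual quality of HDR videos by conducting the study in both a dark laboratory environment and a brighter living-room environment. A total of 66 subjects participated in the study and more than 20,000 opinion scores were collected, making this the largest study of HDR video quality to date. We anticipate that the dataset will be a valuable resource for researchers developing better perceptual quality models for HDR videos.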
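Masked image modeling (MIM) has recently gained considerable attention for its capacity to learn from vast amounts of unlabeled data, and it has been shown to be effective on a wide variety of vision tasks involving natural images. Meanwhile, the potential of self-supervised learning for 3D medical images is expected to be immense, given the large quantity of unlabeled images and the expense and difficulty of obtaining quality labels. However, the applicability of MIM to medical images remains uncertain. In this paper, we demonstrate that masked image modeling approaches can advance 3D medical image analysis in addition to natural images. We study how masked image modeling strategies improve performance, using 3D medical image segmentation as a representative downstream task: i) compared with naive contrastive learning, masked image modeling approaches accelerate the convergence of supervised training even further (1.40$\times$) and ultimately produce a higher Dice score; ii) predicting raw voxel values with a high masking ratio and a relatively small patch size is a non-trivial self-supervised pretext task for medical image modeling; iii) a lightweight decoder or projection head design for reconstruction is powerful for masked image modeling on 3D medical images, speeding up training and reducing cost; iv) finally, we also investigate the effectiveness of MIM methods in different practical scenarios with different image resolutions and labeled data ratios.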
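As a rough illustration of two quantities the abstract refers to, the sketch below selects which 3D patches to mask for a given masking ratio and patch size, and computes a Dice score between two binary masks. It is not the authors' implementation; `sample_masked_patches` and `dice_score` are hypothetical names, and uniform random patch sampling is an assumption.

```python
import random


def sample_masked_patches(volume_shape, patch_size, mask_ratio, seed=0):
    """Pick patch indices to hide from the encoder.

    The volume is split into non-overlapping cubes of side `patch_size`;
    a fraction `mask_ratio` of them is chosen uniformly at random
    (an assumption; other sampling schemes are possible).
    """
    grid = tuple(dim // patch_size for dim in volume_shape)
    num_patches = grid[0] * grid[1] * grid[2]
    num_masked = int(round(mask_ratio * num_patches))
    rng = random.Random(seed)
    masked = set(rng.sample(range(num_patches), num_masked))
    return grid, masked


def dice_score(pred, target):
    """Dice coefficient between two flat binary masks of equal length."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Two empty masks agree perfectly by convention.
    return 1.0 if total == 0 else 2.0 * intersection / total


grid, masked = sample_masked_patches((32, 32, 32), patch_size=8, mask_ratio=0.75)
score = dice_score([1, 1, 0, 0], [1, 0, 0, 0])
```

With a 32×32×32 volume and 8-voxel patches there are 64 patches, so a 0.75 masking ratio hides 48 of them; a smaller patch size increases the number of patches and makes the reconstruction targets finer.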
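Visual Question Answering (VQA) has gained considerable traction in the machine learning community in recent years because of the challenges posed by understanding information from multiple modalities (i.e., images and language). In VQA, a series of questions is posed about a set of images, and the task is to arrive at the answers. To achieve this, we take a symbolic reasoning approach using the framework of formal logic. The images and questions are converted into symbolic representations on which explicit reasoning is performed. We propose a formal logic framework in which (i) images are converted into logical background facts with the help of scene graphs, (ii) questions are translated into first-order predicate logic clauses by a transformer-based deep learning model, and (iii) satisfiability checks are performed, using the background knowledge and the grounding of the predicate clauses, to obtain the answer. Our proposed method is highly interpretable, and each step in the pipeline can be easily analyzed by a human. We validate our approach on the CLEVR and GQA datasets. We achieve a near-perfect accuracy of 99.6% on the CLEVR dataset, comparable to state-of-the-art models, showing that formal logic is a viable tool for tackling visual question answering. Our model is also data-efficient, achieving 99.1% accuracy on the CLEVR dataset when trained on only 10% of the training data.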
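The three-step pipeline can be illustrated with a toy example. This is only a sketch: the facts are written by hand instead of being extracted from a scene graph, the query is hand-coded instead of being produced by a transformer, and a brute-force search over groundings stands in for whatever solver the authors use. All names (`facts`, `groundings`, the predicates) are hypothetical.

```python
from itertools import product

# (i) Background facts, as a scene graph might yield them (hand-written here).
facts = {
    ("cube", "o1"), ("red", "o1"),
    ("sphere", "o2"), ("blue", "o2"),
    ("left_of", "o1", "o2"),
}


def objects(facts):
    """Collect every object constant that appears in the facts."""
    return sorted({arg for fact in facts for arg in fact[1:]})


def groundings(query, variables, facts):
    """Return all variable assignments under which every query atom is a fact."""
    results = []
    for values in product(objects(facts), repeat=len(variables)):
        binding = dict(zip(variables, values))
        grounded = {
            (atom[0],) + tuple(binding.get(arg, arg) for arg in atom[1:])
            for atom in query
        }
        if grounded <= facts:  # every grounded atom holds
            results.append(binding)
    return results


# (ii) "Is there a red cube to the left of a sphere?" as a conjunction of atoms.
query = [("red", "X"), ("cube", "X"), ("left_of", "X", "Y"), ("sphere", "Y")]

# (iii) The question is answered "yes" if at least one grounding satisfies it.
answer = groundings(query, ["X", "Y"], facts)
```

Because the answer is a set of explicit bindings, each step can be inspected directly, which is the kind of interpretability the abstract emphasizes.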
In recent years, deep neural networks have emerged as a dominant machine learning tool for a wide variety of application domains. However, training a deep neural network requires a large amount of labeled data, which is an expensive process in terms of time, labor and human expertise. Domain adaptation or transfer learning algorithms address this challenge by leveraging labeled data in a different, but related source domain, to develop a model for the target domain. Further, the explosive growth of digital data has posed a fundamental challenge concerning its storage and retrieval. Due to its storage and retrieval efficiency, recent years have witnessed a wide application of hashing in a variety of computer vision applications. In this paper, we first introduce a new dataset, Office-Home, to evaluate domain adaptation algorithms. The dataset contains images of a variety of everyday objects from multiple domains. We then propose a novel deep learning framework that can exploit labeled source data and unlabeled target data to learn informative hash codes, to accurately classify unseen target data. To the best of our knowledge, this is the first research effort to exploit the feature learning capabilities of deep neural networks to learn representative hash codes to address the domain adaptation problem. Our extensive empirical studies on multiple transfer tasks corroborate the usefulness of the framework in learning efficient hash codes which outperform existing competitive baselines for unsupervised domain adaptation.
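To make the role of hash codes concrete, here is a minimal sketch of how binary codes can be used for classification by nearest-neighbor lookup in Hamming space. It assumes codes are obtained by thresholding real-valued features at zero, which is a common simplification and not necessarily the authors' approach; `to_hash_code`, `hamming`, and `classify` are hypothetical helpers.

```python
def to_hash_code(features):
    """Binarize a real-valued feature vector by sign (assumed scheme)."""
    return tuple(1 if x >= 0 else 0 for x in features)


def hamming(a, b):
    """Number of positions at which two equal-length codes differ."""
    if len(a) != len(b):
        raise ValueError("codes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def classify(query_code, labeled_codes):
    """Return the label of the stored code closest to the query in Hamming distance.

    labeled_codes is a list of (code, label) pairs, e.g. from the source domain.
    """
    return min(labeled_codes, key=lambda item: hamming(query_code, item[0]))[1]


# Illustrative source-domain codes and one target-domain query.
source = [((1, 0, 1, 0), "chair"), ((0, 1, 0, 1), "lamp")]
query = to_hash_code([0.3, -1.2, 0.0, 0.4])
label = classify(query, source)
```

Storing short binary codes instead of full feature vectors is what gives hashing its storage and retrieval efficiency: comparison reduces to counting differing bits.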